The Journal of Technology, Learning, and Assessment 


Volume 8, Number 2 ■ January 2010 


On the Roles of External 
Knowledge Representations 
in Assessment Design 


Robert J. Mislevy, John T. Behrens, 

Randy E. Bennett, Sarah F. Demark, 
Dennis C. Frezzo, Roy Levy, 

Daniel H. Robinson, Daisy Wise Rutstein, 
Valerie J. Shute, Ken Stanley, 

& Fielding I. Winters 


www.jtla.org 


A publication of the Technology and Assessment Study Collaborative 
Caroline A. & Peter S. Lynch School of Education, Boston College 



Editor: Michael Russell 
russelmh@bc.edu 

Technology and Assessment Study Collaborative 
Lynch School of Education, Boston College 
Chestnut Hill, MA 02467 

Copy Editor: Jennifer Higgins 
Design: Thomas Hoffmann 
Layout: Aimee Levy 

JTLA is a free online journal, published by the Technology and Assessment Study 
Collaborative, Caroline A. & Peter S. Lynch School of Education, Boston College. 

Copyright ©2010 by the Journal of Technology, Learning, and Assessment 
(ISSN 1540-2525). 

Permission is hereby granted to copy any article provided that the Journal of Technology, 
Learning, and Assessment is credited and copies are not sold. 


Preferred citation: 

Mislevy, R.J., Behrens, J.T., Bennett, R.E., Demark, S.F., Frezzo, D.C., Levy, R., 
Robinson, D.H., Rutstein, D.W., Shute, V.J., Stanley, K., & Winters, F.I. (2010). 
On the Roles of External Knowledge Representations in Assessment Design. 
Journal of Technology, Learning, and Assessment, 8(2). Retrieved [date] from 
http://www.jtla.org. 



Abstract: 


People use external knowledge representations (KRs) to create, identify, depict, trans- 
form, store, share, and archive information. Learning to work with KRs is central to 
becoming proficient in virtually every discipline. As such, KRs play central roles in cur- 
riculum, instruction, and assessment. We describe five key roles of KRs in assessment: 

1. An assessment is itself a KR, which makes explicit the knowledge that is 
valued, ways it is used, and standards of good work. 

2. The analysis of any domain in which learning is to be assessed must include 
the identification and analysis of the KRs in that domain. 

3. Assessment tasks can be structured around the knowledge, relationships, 
and uses of domain KRs. 

4. “Design KRs” can be created to organize knowledge about a domain in 
forms that support the design of assessment. 

5. KRs in the discipline of assessment design can guide and structure 
domain analyses (re #2), task construction (re #3), and the creation and 
use of design KRs (re #4). 

The third and fourth roles are developed in greater detail, through an “evidence-centered” 
design perspective that reflects the fifth role. Recurring implications of technology that 
leverage the impact of KRs in assessment are highlighted, including task design supports 
and automated task construction and scoring. Ideas are illustrated with “generate exam- 
ples” tasks and simulation tasks for computer network design and troubleshooting. 

Introduction 

Knowledge representation is a central theme in cognitive psychology. 
Internal knowledge representation refers to the way that information 
about the world is represented in our brains, and as such lies at the center 
of learning, interacting, and problem-solving of all kinds. This paper con- 
cerns external forms of knowledge representation. An external knowledge 
representation (abbreviated KR below), or inscription (Lehrer & Schauble, 
2002), is a physical or conceptual structure that depicts entities and rela- 
tionships in some domain, in a way that can be shared among different indi- 
viduals or the same individual at different points in time. KRs are human 
inventions that overcome obstacles to human information processing with 
respect to working memory limitations, faulty long-term memory over 
time and in volume, coordinating actions across individuals, and providing 
common ways of thinking about some phenomenon of shared interest. 
Examples of KRs include maps, lists, graphs, wiring diagrams, bus sched- 
ules, musical notation, mathematical formulas, object models for business 
systems, and the 7-layer OSI model for computer networks. 

This paper considers the roles of KRs in educational assessment, with 
an eye toward making the activities of assessment design more explicit, 
more valid, and more efficient. A red thread highlighting the implications 
of technology runs through the discussion. Technological developments 
make KRs possible that are more interactive, support automated trans- 
formations, and enable collaboration in ways that are transforming the 
practice of assessment — “assessment engineering,” to use Luecht’s (2002, 
2007) term. We aim to bring to the surface the interplay among psychology 
(through the lens of KRs), technology, and assessment theory upon which 
this transformation is grounded. 

The following section provides a brief review of important features of 
KRs. Five roles of KRs in assessment are then outlined. We note how KRs 
connect expertise with learning and assessment in a domain, and hence 
shape both instructional and assessment design. We then further develop 
and illustrate two of these roles, namely the design of assessment tasks 
around domain KRs and the creation of special KRs that help the assess- 
ment designer accomplish this. We place this discussion in the context of 
evidence-centered assessment design (ECD; Mislevy, Steinberg, & Almond, 
2003; Mislevy & Haertel, 2006) to take advantage of KRs emerging from 
that work. 

The ideas are illustrated with examples from three assessment projects. 
A relatively simple example based on Butterfield et al. (1985) concerning 
inductive reasoning tasks is interleaved through the discussion. Two 
more-complex examples are discussed in greater detail later in the paper. 
They concern a “generating examples” task type developed at Educational 
Testing Service (Bennett et al., 1999; Bennett, Morley, & Quardt, 2000; 
Katz, Lipps, & Trafton, 2002) and Cisco Systems’ computer network simulation 
(CNS) assessments of design and troubleshooting (Behrens et al., 
2004; Frezzo & Stanley, 2005; Williamson et al., 2004). 

Knowledge Representations in Assessment 

KRs play a central role in human cognition, as a means of identifying, 
expressing, communicating, and utilizing information in social spheres. 
Generally speaking, KRs are a vehicle for discourse, used either by a single 
individual (mediated cognition) or among individuals (distributed cogni- 
tion), at one point in time or across multiple time points. They concern 
entities, relationships, and processes in some domain, and their organiza- 
tional form is used to create, gather, store, transform, and use information 
more easily than would be accomplished without them. Markman’s (1999, 
pp. 5-8) definition of a KR has four components: 

• A represented world: The domain that the representations are 
about. The represented world might be the world outside the 
cognitive system or some other set of representations inside the 
system. That is, one set of representations can be about another 
set of representations. 

• A representing world: The domain that contains the 
representations. (The terms “represented world” and 
“representing world” come from a classic paper by Palmer, 1978.) 

• Representing rules: The representing world is related to the 
represented world through a set of rules that map elements of 
the represented world to elements of the representing world. 

• A process that uses the representation: It makes no sense to 
talk about representations in the absence of processes. The 
combination of the first three components (a represented world, 
a representing world, and a set of representing rules) creates 
merely the potential for representation. Only when there is also 
a process that uses the representation does the system actually 
represent, and the capabilities of a system are defined only when 
there is both a representation and a process. Increasingly, as we 
will see in the case of assessment, these processes can be carried 
out digitally as well as perceptually, cognitively, or mechanically, 
as has been the case historically. 

Some KRs, such as mathematical notation and computer languages, 
gain their power through symbol manipulation. After information has 
been encoded in the required form, operations can be carried out on the 
symbols to transform or combine the information in ways that would 
be difficult or impossible for a human to do unaided. A quotation from 
Whitehead (1911) is apropos: 

By relieving the brain of all unnecessary work, a good notation sets it 
free to concentrate on more advanced problems, and, in effect, increases 
the mental power of the race. ... Civilisation [sic] advances by extending 
the number of important operations which we can perform without 
thinking about them. (pp. 59, 61) 

Other KRs, such as graphs and maps, encode information in ways that 
capitalize on humans’ strengths in recognizing patterns and interpreting 
spatial relationships (see, for example, Lewandowsky & Behrens, 1999, on 
statistical graphs and maps): 

The greatest possibilities of visual display lie in vividness and 
inescapability of the intended message. A visual display can stop your 
mental flow in its tracks, and make you think. A visual display can 
force you to notice what you never expected to see. One should see the 
intended at once; one should not even have to wait for it to appear 
(Tukey, 1990, p. 367). 

Many KRs use both symbolic and perceptual representation in varying 
mixtures (e.g., Tufte, 1990). A table exploits spatial arrangement to com- 
municate the relevance of the organizing concepts of rows and columns for 
the subject of each cell (Mosenthal & Kirsch, 1989). Technology extends 
the power of KRs in several respects. Interactivity, as in working through 
a wizard to complete a tax form, and collaboration over a distance, as in 
online meeting workspaces that share computer applications, are two 
familiar examples. Digital KRs are particularly amenable to automated 
symbol manipulation, in ways and at speeds that far outstrip unaided 
human cognition. A central problem in human-computer interaction is 
developing and tuning the KRs through which people interact with com- 
puters to exploit these capabilities. 

Properties of Knowledge Representations 

Several properties of KRs are relevant to their roles in assessment. One 
of the most important is that a KR does not attempt to include every- 
thing in the represented world, only certain entities and relationships. It 
highlights those entities and relationships, and facilitates thinking about 
them, talking about them, and working with them. This is the ontology of 
the KR. Unrepresented aspects of the represented world are considered 
irrelevant. The velocity of a falling body is represented by $v_0 + gt$, whether 
the body is a cannonball or a feather, whether it is falling in Austin or 
Tokyo. The breadth of applicability of KRs can be a strength. It is also a 
potential weakness in application when what is omitted from the mapping 
is important in the real-world situation, as when the velocity of a falling 
feather is lower because of air resistance. While carrying out reasoning 
within the representing world is important in learning to use KRs, it is just 
as important to learn when to apply them gainfully and how to recognize a 
potentially hazardous misfit (a central topic in statistics, for example; e.g., 
Belsley, Kuh, & Welch, 1980, on diagnostics in regression analysis). 
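
The misfit can be made concrete with a small numerical sketch. The code below compares the idealized representation of velocity (initial velocity plus g times elapsed time) with a model that includes drag proportional to speed. The linear drag model, the function names, and the terminal velocity used for a feather-like body are illustrative assumptions, not part of the original discussion.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed constant)


def velocity_ideal(t, v0=0.0):
    """Velocity under the idealized KR: v0 + g*t, with no air resistance."""
    return v0 + G * t


def velocity_linear_drag(t, terminal_velocity, v0=0.0):
    """Velocity with drag proportional to speed (an assumed drag model).

    This is the solution of dv/dt = g - (g / terminal_velocity) * v.
    """
    tau = terminal_velocity / G  # time constant of the approach to terminal velocity
    return terminal_velocity + (v0 - terminal_velocity) * math.exp(-t / tau)


# Hypothetical comparison after two seconds of falling from rest.
ideal = velocity_ideal(2.0)
feather_like = velocity_linear_drag(2.0, terminal_velocity=1.0)
```

Under these assumptions, the idealized formula gives about 19.6 m/s after two seconds for any body, while a body with a terminal velocity of 1 m/s is still moving at just under 1 m/s. The gap between the two values is the part of the represented world that the simpler KR leaves out.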

In addition to focusing on only certain aspects of situations in the 
represented world, KRs are optimized for certain uses regarding those 
aspects. A domain of any complexity typically has many KRs, each tuned 
to different relationships and purposes. For example, matrix algebra, path 
diagrams, and computer code input are all used to represent factor anal- 
ysis and structural equation models (SEMs) in psychometrics (Figure 1 
shows two different representations of the same factor analysis model). 
The matrix equations lend themselves to symbol-manipulation procedures for taking 
derivatives, which support algorithms for finding the values of the vari- 
ables that fit the data best — finding maxima of multivariable likelihood 
functions is not something people do well in their heads. But graphical 
representations have advantages at the model-building stage, because the 
qualitative relationships among variables are immediately apparent and 
rapidly specified. Computer programs such as EQS (Bentler, 2006) allow 
the user to specify a model by working with a graphical interface, then 
generate code automatically to estimate the parameters with algorithms 
derived under the algebraic representation. 

Note the essential role of technology in this process: The user working 
with a graphical interface is using an interactive computer-based represen- 
tation that facilitates spatial thinking about relationships among variables; 
the computer representation is a digital encoding of the sequence of drags, 
drops, clicks, and typed characters; the computer program transforms this 
digital representation of user actions into another digital representation 
that would correspond in turn to an algebraic expression, upon which to 
carry out mathematical operations. The outcomes of these operations are 
re-expressed as human-friendly KRs such as graphs and tables of results, 
including human-accessible traces of processing such as changes in the 
log likelihood function at each cycle of an iterative process on computer- 
friendly KRs. We will see later in the CNS example how similar algorithmic 
conversions from one knowledge form to another provide advantages in 
computer-based assessment systems for domain analysis, task authoring, 
task presentation, interaction with the examinee, and automated scoring. 
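
The conversion from a graphical specification to an algebraic one can be sketched in a few lines. In the illustration below, a path diagram is stored as a list of arrows from factors to indicators and is transformed into the pattern of free and fixed loadings that a matrix form would use. The variable names and the two-factor model are hypothetical, and the sketch is not a description of how EQS or any other program works internally.

```python
# Path-diagram form: one (factor, indicator) pair per arrow.
# The factor and indicator names are hypothetical.
paths = [
    ("F1", "x1"), ("F1", "x2"), ("F1", "x3"),
    ("F2", "x4"), ("F2", "x5"), ("F2", "x6"),
]


def loading_pattern(paths):
    """Convert the arrow list into a 0/1 loading-pattern matrix.

    Rows are indicators and columns are factors. A 1 marks a loading that
    is free to be estimated; a 0 marks a loading fixed at zero.
    """
    factors = sorted({factor for factor, _ in paths})
    indicators = sorted({indicator for _, indicator in paths})
    arrows = set(paths)
    matrix = [
        [1 if (factor, indicator) in arrows else 0 for factor in factors]
        for indicator in indicators
    ]
    return factors, indicators, matrix


factors, indicators, pattern = loading_pattern(paths)
```

The same information is present in both forms. The arrow list is easy to build and inspect by eye, while the matrix is the form on which algebraic operations would be carried out.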

Figure 1: Matrix Algebra and Path Diagram Representations of a Factor Analysis Model 

This attunement of KRs to different processes and purposes explains 
the presence of multiple KRs in a given domain (Ainsworth, 1999). Multiple 
KRs also occur when the complexities of real-world situations lend them- 
selves to modeling at different levels or from different perspectives. In 
transmission genetics, for example, there are KRs for expressing relation- 
ships at the levels of species, individuals, cells, and molecules. Although 
each KR highlights entities and relationships at a certain level of analysis, 
relationships and constraints can cross levels and representational forms 
as well. The similarities of elements’ chemical properties in a column of 
Mendeleev’s periodic table correspond to similarities in electron shell 
diagrams. Translating information from one form to another is often a 
target of learning in content domains, as the process of solving a problem 
can take the form of a sequence of transformations within and between 
models, mediated by operations carried out with KRs. 

One can speak of KRs at various levels of generality. For example, the 
elements and representational capabilities of Cartesian graphs can be 
addressed at a general level, as a way of displaying the kinds of 
relationships that can hold among variables in any domain. 
Certain knowledge associated with graphs can thus be learned and used 
(and assessed) across domains. Scatter plots in statistics and acceleration 
graphs in physics are both special cases of Cartesian graphs that can be 
studied in their own right, as patterns in graphs correspond to more spe- 
cialized representations such as acceleration formulas, which are in turn 
grounded in the generative principles of that particular domain. 

KRs have value because people can do things with them. Well-chosen 
KRs incorporate subtle and hard-won insights into a form that can be 
applied mechanically. Fifty years ago, an economist could win a Nobel 
prize for generating and solving from first principles the kinds of systems 
of linear equations that EQS users can apply today without knowing either 
calculus or matrix algebra. It is an advantage of a KR that a user can exploit 
deep principles without knowing them explicitly. To enjoy these benefits, 
however, the user must become attuned to the ways the KR offers to create, 
display, or transform information — its affordances, to use Gibson’s (1966) 
term. The problem of designing KRs to best communicate information and 
affordances receives both practical and academic attention in fields such as 
graphics (e.g., Pinker, 1990) and human-computer interfaces (e.g., Card, 
Moran, & Newell, 1983). This research is prompted in part by the fact 
that KRs that can be expressed in symbolic form support multiple views 
and automated transformations. For example, CNS works back and forth 
between perceptual KRs for presenting and capturing information from 
examinees and symbolic KRs for evaluating their work and transforming 
information from one form to another. 

How do KRs facilitate work? By focusing on recurrent patterns at a 
level above the particulars of any problem, KRs facilitate analogies across 
problems and domains. They make it easier to acquire and structure infor- 
mation. They coordinate work in projects that are so large or complex that 
no one can know all the details of all their facets. In such cases, KRs such as 
Gantt charts and object models help people understand their roles and con- 
nect their work with that of others. They provide a common language for 
people to express information and work with it in ways that tacitly incor- 
porate experience from other times and other people. The form of a KR 
can indicate when information is missing. For example, representing text 
information in a matrix graphic organizer rather than text makes missing 
information more salient (Figure 2). KRs such as blueprints, 
agendas, schedules, and to-do lists are significant in planning, because they 
indicate what information is needed, how it is to be acted upon, and what a 
solution will look like. Collins and Ferguson (1993) emphasize that people 
can create new knowledge with KRs by referring to KRs as “epistemic 
forms” and to the ways that people use them as “epistemic games.” 

Figure 2: What Information is Presented About Moths but Not About 
Butterflies? The Missing Element is Easier to See From the Matrix 
Organizer than in the Text 


Moths and Butterflies (text) 

A moth has two sets of wings. It folds the wings down over its body 
when it rests. The moth has feathery antennae and spins a fuzzy cocoon. 
The moth goes through four stages of development. 

A butterfly also goes through four stages of development and has 
two sets of wings. Its antennae, however, are long and thin with knobs at 
the ends. When a butterfly rests, its wings are straight up like outstretched 
hands. 


Moths and Butterflies (matrix organizer) 

              Moth               Butterfly
Wings         Two sets           Two sets
Rest          Wings over body    Wings outstretched
Antennae      Feathery           Long, thin, with knobs
Cocoon        Fuzzy              --
Development   Four stages        Four stages
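
The way a matrix form exposes gaps can also be sketched computationally. Below, the organizer in Figure 2 is stored as a nested mapping, on the assumption that its two columns are Moth and Butterfly, and a short function lists the cells that were left empty. The names `organizer` and `missing_cells` are illustrative.

```python
# The matrix organizer as a nested mapping; None marks an empty cell.
organizer = {
    "Wings": {"Moth": "Two sets", "Butterfly": "Two sets"},
    "Rest": {"Moth": "Wings over body", "Butterfly": "Wings outstretched"},
    "Antennae": {"Moth": "Feathery", "Butterfly": "Long, thin, with knobs"},
    "Cocoon": {"Moth": "Fuzzy", "Butterfly": None},
    "Development": {"Moth": "Four stages", "Butterfly": "Four stages"},
}


def missing_cells(table):
    """Return (attribute, subject) pairs whose cell holds no information."""
    return [
        (attribute, subject)
        for attribute, row in table.items()
        for subject, value in row.items()
        if value is None
    ]


gaps = missing_cells(organizer)
```

In the running text, the same gap can be found only by rereading both paragraphs and comparing them attribute by attribute. In the matrix form, the empty cell is a structural feature that can be located mechanically.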


Roles of Knowledge Representations in Assessment 

Looking at educational assessment through the lens of KRs reveals 
their presence throughout the enterprise, at different stages, at different 
levels, and with different purposes. The following sections discuss five key 
roles that KRs play in assessment: 

1. An assessment is in itself a KR, which makes explicit the 
knowledge that is valued, ways it is used, and standards of good 
work. 

2. The analysis of any domain in which learning is to be assessed 
must include the identification and analysis of the KRs in that 
domain (that is, the “domain KRs”). 

3. Assessment tasks can be structured around the knowledge, 
relationships, and uses of domain KRs. 

4. “Design KRs” can be created to organize knowledge about a 
domain (including its domain KRs) in forms that support the 
design of instruction and assessment. 


5. KRs from the disciplines of instructional design and assessment 
design can guide and structure the domain analyses noted in 
(2), the task construction noted in (3), and the creation and use 
of design KRs noted in (4). 

Assessments Are Themselves Knowledge Representations 

The analogy of assessment to measurement is vital to its conduct, but 
it is not sufficient. A student taking an assessment is engaged in a form 
of socially construed discourse (Gitomer & Steinberg, 1999), no less than 
a teenager playing a video game or a taxpayer completing an IRS 1040 
form. This observation holds implications for assessment designers and 
students alike. Designers must always be aware that an assessment con- 
stitutes the most direct statement of the knowledge and skills that are 
valued, in effect if not in intention. The process of constructing an assess- 
ment, done thoughtfully, elicits an understanding of the knowledge that 
is targeted, the actions of students that provide evidence about it, and 
the circumstances under which that knowledge should be brought to bear 
(Wiggins, 1998). An assessment is a KR that communicates the targets 
of learning and the standards of performances to all stakeholders, and 
its construction serves educative purposes before the first examinee ever 
sees it. 

In order to perform well in an assessment, students must not only have 
become facile with the targeted knowledge and skills, but they must also 
be able to work with them in the forms and under the conditions that char- 
acterize the assessment situation. That is, the students must be attuned to 
the affordances of the assessment as a form of KR. The more complex an 
assessment is, in terms of the embedded KRs students will interact with 
and the standards by which KRs that students produce will be evaluated, 
the more important it is to ensure that this attunement has taken place 
before the assessment begins. For students attempting to solve an inter- 
active chemistry investigation with an unfamiliar computer interface, 
the interface can present more difficulties than the chemistry. Similarly, 
students cannot “explain” a solution to a mathematics problem until they 
understand the nature, the forms, and the expectations of exposition that 
are required to produce a “satisfactory explanation.” 

Identifying the Knowledge Representations of a Domain 

Becoming an expert in a domain is a process of learning about the 
nature of knowledge in the domain, including terms, principles, patterns, 
and exemplars, and the nature of interaction among those who participate 
in that domain (Ericsson, 1996). The kinds of knowledge highlighted under 
both an acquisition metaphor and a participation metaphor (Sfard, 1998) 
are required. KRs play central roles in both. KRs embody the important 
ideas and relationships in a domain, organize them so that they are the 
vehicle for doing work in the domain, define the language by which people 
acquire and communicate information in that domain, and coordinate the 
interactions of people as they work toward common ends. It is not much 
of an overstatement to say that learning in a domain is learning to use 
the KRs of the domain — the domain KRs, as we call them here. 

No analysis of a learning domain can be complete without an investi- 
gation of the KRs that are used in the domain and the situations in which 
they are used. Learning materials such as textbooks and exemplars are a 
natural place to begin, but the selection of KRs used in instruction can be 
biased toward “academic” KRs. Additional KRs used in practical work, per- 
haps informal or embedded in tools, are also part of the targeted domain, 
and learning how and when to use them is part of the targeted learning. 

Structuring Tasks Around Domain Knowledge Representations 

Assessment is reasoning about what students know, can do, or have 
accomplished more broadly, from evidence in the form of a relative handful 
of particular things they say, do, or make in particular situations. The situ- 
ations in which the student is to act are defined in no small part through 
KRs. The various KRs that constitute an assessment task provide informa- 
tion about a situation to the student, suggest the nature of the problem, 
suggest the terms in which the problem is to be approached, offer clues as 
to the nature of a solution and the criteria of evaluation, and provide affor- 
dances for getting started. This is as true of open-ended performances or 
portfolios as it is of objective tests consisting of multiple-choice items. 
Furthermore, what the student says, does, or makes in response — the 
work products — are typically structured in terms of the KRs of the domain 
as well. Scalise and Gifford (2006) describe how, in technology-supported 
environments, having examinees complete or construct representations 
not only increases fidelity to the domain, but facilitates construct-driven 
automated scoring. Indeed, it is increasingly common, especially in simu- 
lation-based tasks like the CNS tasks, that complex interactive KRs consti- 
tute the environment in which the examinee thinks and acts. 

Research on expertise reveals increasing expertise in the use of domain 
KRs as proficiency increases, in ways that hold implications for designing 
tasks and evaluating performances. As a first example, Kindfield’s (1999) 
study of experts’ and novices’ use of diagrams to reason through genetics 
problems revealed an interesting reversal: Novices’ drawings were often 
more complete and better proportioned than experts’, but what distin- 
guished experts’ diagrams was that only the salient features tended to be 
shown, and the relationships important to the problem at hand were ren- 
dered with whatever accuracy was needed to solve the problem. That is, 
the experts’ diagrams were more efficacious than those of the novices. As 
a second example, Cameron et al. (2000) found increasing proficiency in 
dental hygienists at increasing levels of experience with respect to their use 
of KRs such as radiographs, hard and soft tissue charts, and probing depth 
charts. Early stages of learning were marked by the ability to identify and 
interpret key features on a given single representation. Expert hygienists 
were distinguished from recently licensed hygienists by a superior ability 
to integrate information across multiple representations of different types, 
effectively constructing a model of a patient about whom all the represen- 
tations were different, yet coherent, views of the same person. 

A central idea for assessment design, and a central topic of the CNS 
example, is that a systematic analysis of the KRs in a domain — what they 
are, their features, and how people use them — is a foundation for prin- 
cipled generation of assessment tasks. An understanding of the entities 
and relationships of each KR and the relationships among them is devel- 
oped in conjunction with an understanding of the kinds of reasoning 
or actions that one wants students to carry out using the KRs. The out- 
comes of this analysis lay the groundwork for schemas of tasks that focus 
on valued work in the domain in explicit ways, and exist at some level of 
generality above particular tasks. The level of generality of the KRs and 
the resulting schemas depends on the intended use, with the usual under- 
standing that broad applicability of general forms trades off against the 
power of specific forms. These task construction schemas can themselves 
be expressed in terms of KRs. Hively, Patterson, and Page’s (1968) item 
forms and Haladyna and Shindoll’s (1989) item shells represented initial 
research along these lines, while more recent technology-based task con- 
struction frameworks include those of Bejar et al. (2003), Gierl, Zhou, and 
Alves (2008), and Mislevy et al. (2003). 

At this point, we introduce an example from Butterfield et al. (1985) 
concerning theory-based generation of letter series tasks, a measure of 
inductive reasoning (Thurstone & Thurstone, 1941). Here are two exam- 
ples based on the Primary Mental Abilities test battery (Thurstone & 
Thurstone, 1962): 


Fill in the next letters in the series: 

CDCDCD 


ATBATAATBAT 

This KR is an example of an item type, a particular kind of KR used in 
assessment to present information to an examinee and set expectations 
for a response. This particular KR consists of a series of symbols, read from 
left to right, arranged according to a pattern, or rule, that both explains 
the appearance of the symbols that are depicted and sets expectations for 
the symbols that would come next. The student’s task is to determine the 
rule and make predictions. The blanks are affordances — the natural place 
to write the symbols that extend the pattern if you understand what the 
KR is about, but mysteries if you do not. Although these items require no 
specialized content knowledge other than the alphabet, they reflect the 
kind of reasoning required in more-complex inductive problems that do 
require more substantive knowledge, such as scientific inquiry. Because 
this is the representational form that the student works with, it is the 
domain KR in our first assessment example. 

Representations for Designing Assessments in Given Domains 

Advantages can be gained when the characteristics of the KRs can 
themselves be represented in higher-level KRs that are devised to serve 
the purposes of assessment design. We may call these “design KRs.” Design 
KRs are related to domain KRs, but they are built for the purpose of gener- 
ating domain KRs to be used in tasks. They describe salient features of task 
situations, in ways that both imply domain representations and indicate 
the kinds of reasoning and knowledge that the student will need to call 
upon. We shall see that the same representations can provide information 
to KRs used in other stages of assessment design and delivery, such as task 
selection and psychometric modeling (Bejar, 2002; Embretson, 1998). 

Butterfield et al. (1985) created a design KR for the domain of letter 
series tasks described in the previous section. Letter series tasks had been 
used at least as early as Thurstone’s research in 1941, in both practical 
applications and psychometric research. Task generation was idiosyncratic, 
however, and systematic examinations of both the structure of tasks and 
how people solve them were lacking (Butterfield et al., 1985). Simon and Kotovsky 
(1963) devised a symbol system to describe such tasks after they had been 
written, and their analysis is Butterfield et al.’s starting point for a KR that 
supports automated task generation in this domain. An abbreviated version 
and a few examples of the design KR for letter series rules convey the key 
ideas: Letter series tasks are composed of one or more strings of letters. 
Within a string, special relationships hold for moving from one letter to 
the next, such as identity (I), next letter (N), and back a letter (B). A rule 
is expressed by the relationships of letters within a string, and the strings’ 
relationships to one another. The rule underlying the series CDCDCD is
denoted by I1 I2 (the digit identifies the string to which each relation
applies), instantiated with C and D as the initial values of the first
and second strings. The same rule instantiated with R and T as the initial
values yields RTRTRT. The series MABMBCMCD is expressed as I1 I2 N2,
with initial values M and A.
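
The rule notation can be made concrete with a short sketch. The Python below is an illustration only, not Butterfield et al.'s implementation. It assumes that each relation is written as a letter (I, N, or B) followed by the index of the string it acts on, that the first use of a string emits its initial value unchanged, and that the alphabet wraps around. The function name `generate_series` is hypothetical.

```python
import string

ALPHABET = string.ascii_uppercase

# Relation codes from the text: identity, next letter, back a letter.
STEP = {"I": 0, "N": 1, "B": -1}


def generate_series(rule, initial_values, periods=3):
    """Expand a rule such as "I1 I2 N2" into a letter series.

    Each token is a relation code followed by a 1-based string index.
    Assumption: the first time a string is used, its initial value is
    emitted unchanged; afterwards the relation is applied to the last
    letter emitted for that string. The alphabet wraps around.
    """
    tokens = rule.split()
    current = {}  # string index -> alphabet position of last emitted letter
    out = []
    for _ in range(periods):
        for token in tokens:
            relation, index = token[0], int(token[1:])
            if index not in current:
                current[index] = ALPHABET.index(initial_values[index - 1])
            else:
                current[index] = (current[index] + STEP[relation]) % len(ALPHABET)
            out.append(ALPHABET[current[index]])
    return "".join(out)


print(generate_series("I1 I2", "CD"))     # CDCDCD
print(generate_series("I1 I2", "RT"))     # RTRTRT
print(generate_series("I1 I2 N2", "MA"))  # MABMBCMCD
```

Under these assumptions the three series quoted in the text are reproduced from their rules and initial values.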

This design KR for expressing rules is obviously distinct from letter 
series tasks themselves, but they are related in ways that serve the purposes 
of the assessment designer. A rule expressed in the design KR grammar and 
initial string values suffices to produce a letter series task. Operations can 
be defined on rules expressed in the grammar of the KR to address issues 
of form, such as when two rules produce identical series. Other operations 
on rules address psychological issues such as memory load, as a function 
of calculable properties such as “Counts = # moving strings * (period - # 
adjacent identity relations).” Related operations can be used to address 
psychometric issues such as task difficulty (as in Embretson, 1998). The 
design KR for letter series tasks, therefore, has pragmatic connections to 
the task authoring, psychological argument, and measurement modeling 
layers of the assessment enterprise. 
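
One operation on rules mentioned above, checking whether two rules produce identical series, can be sketched as follows. This is a hedged illustration rather than the procedure used by Butterfield et al.: it compares only a finite prefix of the two series, which is a practical check and not a proof of equivalence. The token format, the treatment of initial values, and the function names `series_prefix` and `same_series` are assumptions made for the example.

```python
from itertools import cycle, islice
import string

ALPHABET = string.ascii_uppercase
STEP = {"I": 0, "N": 1, "B": -1}  # identity, next letter, back a letter


def series_prefix(rule, initial_values, length):
    """Return the first `length` letters generated by a rule.

    Assumption: a string's first use emits its initial value; later uses
    apply the relation to that string's last emitted letter.
    """
    current = {}
    out = []
    for token in islice(cycle(rule.split()), length):
        relation, index = token[0], int(token[1:])
        if index not in current:
            current[index] = ALPHABET.index(initial_values[index - 1])
        else:
            current[index] = (current[index] + STEP[relation]) % len(ALPHABET)
        out.append(ALPHABET[current[index]])
    return "".join(out)


def same_series(rule_a, init_a, rule_b, init_b, length=24):
    """Compare two rules on a finite prefix of the series they generate."""
    return series_prefix(rule_a, init_a, length) == series_prefix(rule_b, init_b, length)


# A rule written with period 2 and the same rule unrolled to period 4.
print(same_series("I1 I2", "CD", "I1 I2 I1 I2", "CD"))
# Two rules that diverge at the fourth letter.
print(same_series("I1 I2", "CD", "I1 N2", "CD"))
```

A task author could use a check of this kind to avoid generating two nominally different items that present the same series.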

An early example of generative design KRs is Hively, Patterson, & Page’s
(1968) idea of “item forms” for generating whole number arithmetic items,
two of which appear as Figure 3. Another example appears in
Bormuth’s (1970) work on generating “wh” questions from text. The KR is 
a syntactic representation of one or more propositions, which is amenable 
to symbolic transformations that yield questions that can be used to assess 
basic comprehension. Both of these examples provided KRs that enabled 
an assessment designer to map the structures and content of domain KRs 
(arithmetic items and English text) into more-abstract KRs that support 
transformations into tasks. The “generating examples” and CNS examples 
in later sections illustrate more recent work, in which the capability of 
computers to carry out symbol manipulation is exploited more fully in 
the automated construction of tasks through technology-based design 
KRs. At the time Bormuth (1970) introduced the “generating questions”
approach mentioned above, for example, the generation rules were algorithmic
but had to be applied by hand, and few applications were carried
out (Roid & Finn, 1977, describe one such application). With current
natural-language processing capabilities, it would be a simple matter to 
construct “wh” questions from English text automatically.

Figure 3: Two "Item Forms" from Hively, Patterson, and Page (1968)

Item form 1
  Descriptive Title: Basic fact; Minuend > 10
  Sample Item: 13 - 6
  General Form: A - B
  Generation Rules:
    1. A = 1a; B = b
    2. (a < b) ∈ U
    3. {H, V}

Item form 2
  Descriptive Title: Borrow across zero
  Sample Item: 403 - 138
  General Form: A - B
  Generation Rules:
    1. # digits = {3, 4}
    2. A = a_1 a_2 ...; B = b_1 b_2 ...
    3. (a_1 > b_1), (a_3 < b_3), (a_4 > b_4), ∈ U_0
    4. b_2 ∈ U_0
    5. a_2 = 0
    6. P{{1, 2, 3}, {4}}

Capital letters represent numerals, lower case represent digits;
x ∈ { ... } means choose x with replacement from the set.

U = {1, 2, ..., 9}; U_0 = {0, 1, ..., 9}.
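
The first item form in Figure 3 can be read as a small generator. The sketch below is an interpretation, not a reproduction of Hively, Patterson, and Page's procedure: it assumes that rule 2 means drawing two digits a < b from U, that "1a" denotes the two-digit numeral with tens digit 1, and that {H, V} refers to horizontal or vertical layout. The function name `basic_fact_item` is hypothetical.

```python
import random

U = range(1, 10)  # U = {1, 2, ..., 9}


def basic_fact_item(rng):
    """Generate one item from the "Basic fact; Minuend > 10" form.

    Rules as read from Figure 3: A = 1a, B = b, with a < b drawn from U.
    Interpreting {H, V} as horizontal/vertical layout is an assumption.
    """
    a, b = sorted(rng.sample(U, 2))  # two distinct digits, so a < b
    minuend = 10 + a                 # the numeral "1a"
    subtrahend = b
    layout = rng.choice("HV")
    if layout == "H":
        text = f"{minuend} - {subtrahend} = "
    else:
        text = f"  {minuend}\n- {subtrahend:>2}\n----"
    return {"A": minuend, "B": subtrahend, "layout": layout, "text": text}


rng = random.Random(0)  # seeded only so that runs are repeatable
for _ in range(3):
    print(basic_fact_item(rng)["text"])
    print()
```

The sample item in the figure, 13 - 6, is one instance this generator can produce (a = 3, b = 6). Because a < b, every generated item requires regrouping, which is the feature the item form is meant to hold constant.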


Knowledge Representations in the Discipline of Assessment Design 

As long as assessment has been practiced, KRs have been developed to 
aid designers. Familiar examples include the aforementioned item types 
and item forms, test specifications (see Davidson & Lynch, 2001, for a recent
in-depth discussion), and content-by-process matrices often based on 
Bloom’s (1956) taxonomy of educational objectives. These KRs are used 
to help designers generate items and assemble test forms. KRs used in 
the analysis of test data are also familiar, from the symbolic representa- 
tions used in psychometric models to innovative displays used to sum- 
marize patterns in performance for students and their teachers. Schemas 
for rubrics to evaluate open-ended task performances are also widely used, 
allowing an assessor (such as a classroom teacher) to adapt a tested evalua- 
tion procedure to locally customized tasks; a number of tools are available 
in interactive formats on the Internet. Wiggins (1998) offers designers of 
performance assessment a number of templates and flowcharts, all with 
an eye toward connecting what is assessed with the goals of instruction. 

Designing assessments of any complexity involves considerations at 
many levels: substantively grounded evidentiary arguments, design of 
operational elements such as tasks and scoring models, implementing the 
design in terms of specific tasks, and all the operational activities involved 
in actually carrying out the assessment. No single KR can encompass all 
this work; multiple, coordinated representations are required. Developing 
frameworks for assessment design, complete with a conceptual rationale 
and multiple supporting KRs, has been a focus of research in the assessment
community in recent years (e.g., Almond, Steinberg, & Mislevy, 2002;
Embretson, 1998; Luecht, 2002, 2007; Wilson, 2005). The next section
discusses one such approach in greater detail.

A Closer Look at KRs and Assessment Design

Evidence-centered assessment design (ECD) is a process of assessment 
design that involves gathering, organizing, and transforming information 
in a variety of representational forms, within the framework of a clearly 
articulated assessment argument. Under the ECD framework, KRs are 
integral at every step in the process of developing and using an assess- 
ment. This section starts with a brief overview of ECD and then, through 
this perspective, discusses and provides examples of KRs in assessment 
design. 

A Brief Overview of Evidence-Centered 
Assessment Design 

Central ideas in ECD are the assessment argument, layers of the assess- 
ment, and the role of KRs in designing and implementing assessments. 
Messick (1994, p. 16) concisely lays out the key aspects of an assessment 
argument by asking “what complex of knowledge, skills, or other attri- 
butes should be assessed? Next, what behaviors or performances should 
reveal those constructs, and what tasks or situations should elicit those 
behaviors?” All of the many terms, concepts, representations, and struc- 
tures in ECD are aimed at constructing a coherent assessment argument 
and building machinery to implement it. 

Adapting a “layers” metaphor from architecture and software engi- 
neering, ECD organizes the design process in terms of the following 
layers: domain analysis, domain modeling, conceptual assessment frame- 
work, assessment implementation, and assessment delivery (Mislevy & 
Riconscente, 2006). The fundamental work in assessment design can be 
viewed as creating, transforming, and using information in the form of 
KRs within and between these layers. Table 1 summarizes
these layers in terms of their roles, key entities (for example, concepts and 
building-blocks), and the KRs that assist in achieving each layer’s purpose. 
The layering suggests a sequential design process, but cycles of iteration 
and refinement across layers are the norm.

Table 1: Layers of Evidence-Centered Design

Layer: Domain Analysis
  Role: Gather substantive information about the domain of interest that
    has direct implications for assessment: how knowledge is constructed,
    acquired, used, and communicated.
  Key Entities: Domain concepts, terminology, tools, knowledge
    representations, analyses, situations of use, patterns of interaction.
  Examples of Knowledge Representations: Content standards, concept maps
    (e.g., Atlas of Science Literacy, AAAS, 2001). Representational forms
    and symbol systems of domain of interest, e.g., maps, algebraic
    notation, computer interfaces.

Layer: Domain Modeling
  Role: Express assessment argument in narrative form based on information
    from domain analysis.
  Key Entities: Knowledge, skills, and abilities; characteristic and
    variable task features; potential work products and observations.
  Examples of Knowledge Representations: Assessment argument diagrams,
    design patterns, content-by-process matrices.

Layer: Conceptual Assessment Framework
  Role: Express assessment argument in structures and specifications for
    tasks and tests, evaluation procedures, measurement models.
  Key Entities: Student, evidence, and task models; student model,
    observable, and task model variables; rubrics; measurement models;
    test assembly specifications.
  Examples of Knowledge Representations: Test specifications; algebraic &
    graphical KRs of measurement models; task template; item generation
    models; generic rubrics; automated scoring code.

Layer: Assessment Implementation
  Role: Implement assessment, including presentation-ready tasks, scoring
    guides or automated evaluation procedures, and calibrated measurement
    models.
  Key Entities: Task materials (including all materials, tools,
    affordances); pilot test data for honing evaluation procedures and
    fitting measurement models.
  Examples of Knowledge Representations: Coded algorithms to render tasks,
    interact with examinees, evaluate work products; tasks as displayed;
    IMS/QTI representation of materials; ASCII files of parameters.

Layer: Assessment Delivery
  Role: Coordinate interactions of students and tasks: task- and
    test-level scoring; reporting.
  Key Entities: Tasks as presented; work products as created; scores as
    evaluated.
  Examples of Knowledge Representations: Renderings of materials;
    numerical and graphical score summaries; IMS/QTI results files.


The first layer in the process of designing an assessment, domain anal- 
ysis, lays the foundation for later layers by defining the knowledge, skills, 
and abilities (KSAs) that assessment users want to make inferences about, 
the student behaviors they can base their inferences on, and the situations 
that will elicit those behaviors. A critical part of domain analysis is the
identification of KRs important to the domain, because expertise in a
domain necessarily includes knowledge of and understanding of how and 
when to use the KRs in that domain. 

At the next layer, domain modeling, KRs within the domain of assess- 
ment design come into play in the form of assessment argument diagrams 
(Bachman, 2003; Mislevy, 2003, 2006; see Figure 4 for the basic structure,
adapted from Toulmin, 1958), content-by-process matrices, and the
design patterns that will be discussed in more depth in the next section.
Using these KRs, domain modeling structures the outcomes of domain
analysis in a form that reflects the structure of an assessment argument, 
in order to ground the more technical student, evidence, and task models 
that are required in the subsequent Conceptual Assessment Framework 
(CAF) layer. 

Figure 4: An Assessment Argument Diagram 



The conceptual assessment framework (CAF) concerns the technical 
specifications for the materials and processes that embody assessments. 
The central models in the CAF are the student model, the evidence model, 
and the task model (Figure 5). In addition, the assembly model
governs how tasks are assembled into tests, a presentation model indicates
the requirements for interaction with a student (for example, simulator
requirements), and the delivery model specifies requirements for the
operational setting. An assessment argument laid out in narrative form at
the domain-modeling layer is here expressed in terms of specifications for 
tasks, measurement models, scoring methods, and delivery requirements. 
Details about task features, measurement-model parameters, stimulus 
material specifications, and the like are expressed in terms of KRs and 
data structures that we will say more about later in this section, which 
guide their implementation and ensure their coordination. 

With information from the models in the CAF, delivery of an assess- 
ment from an ECD perspective is defined by a four-process architecture 
(Figure 6). Starting in the upper left corner of Figure 6, the
activity selection process selects a task (tasks include items, sets of items,
or other activities) and directs the presentation process to display it to the
examinee. When the examinee has finished interacting with the task, the
results (a work product) are sent to response processing. Information from
the task model defined in the CAF provides the basis for the presentation 
process and work product specifications. From information outlined in the 
evaluation model of the CAF, the response process identifies essential obser- 
vations about the results and passes them to the summary scoring process, 
which updates the scoring record about the examinee. The scoring record 
describes knowledge about the student-model variables articulated in the 
student model of the CAF. All four processes add information to the results 
database. The activity selection process again makes a decision about what 
to do next, based on the current scoring record of the participant or other 
criteria. 
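
The cycle just described can be sketched as four cooperating functions. The Python below is a toy illustration under stated assumptions, not the ECD delivery architecture itself: the task format, the function names, the simulated examinee, and the use of a running count of correct responses in place of a measurement model are all hypothetical simplifications.

```python
def activity_selection(tasks, scoring_record):
    """Pick the next task not yet administered, or None to stop."""
    for task in tasks:
        if task["id"] not in scoring_record["administered"]:
            return task
    return None


def presentation(task, examinee):
    """Display the task and capture the examinee's work product."""
    return examinee(task["prompt"])


def response_processing(task, work_product):
    """Identify observations about the work product (here, correctness)."""
    return {"correct": work_product == task["key"]}


def summary_scoring(scoring_record, task, observation):
    """Update the scoring record with the new observation."""
    scoring_record["administered"].append(task["id"])
    scoring_record["n_correct"] += int(observation["correct"])


def run_assessment(tasks, examinee):
    scoring_record = {"administered": [], "n_correct": 0}
    results_database = []  # each process adds a message here
    while True:
        task = activity_selection(tasks, scoring_record)
        if task is None:
            break
        results_database.append(("selected", task["id"]))
        work_product = presentation(task, examinee)
        results_database.append(("work_product", work_product))
        observation = response_processing(task, work_product)
        results_database.append(("observation", observation))
        summary_scoring(scoring_record, task, observation)
        results_database.append(("score", scoring_record["n_correct"]))
    return scoring_record, results_database


# Two letter series tasks; the simulated examinee always answers "C".
tasks = [
    {"id": "t1", "prompt": "CDCDCD_", "key": "C"},
    {"id": "t2", "prompt": "MABMBCMCD_", "key": "M"},
]
record, log = run_assessment(tasks, lambda prompt: "C")
print(record)
```

Each value handed from one function to the next (the task, the work product, the observation, the scoring record) plays the role of a message between processes; in an operational system each would be expressed in an agreed representation rather than an ad hoc dictionary.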

The preceding brief outline is not sufficient to explain the roles and 
interplay of the processes, or the way that this structure supports the 
design of technology-based assessment tasks and delivery systems; the 
reader is referred to Almond, Steinberg, & Mislevy (2002). What is important
for this presentation is that every message that passes from one process
to another is expressed in terms of some KR. It has been produced by
the sender, be it a human or computer, and provided in a form that the 
receiver, again a human or a computer, can use to carry out some other 
function essential to the operation of the assessment. The following
sections provide examples.

Figure 5: The Central Models of the Conceptual Assessment Framework (CAF)

[Figure not reproduced. Recoverable labels: Assembly Model, Delivery Model.]

Figure 6: The Four Principal Processes in the Assessment Cycle

[Figure not reproduced. Recoverable labels: Activity Selection, Presentation.]

Domain Reasoning, Knowledge Representations, and Task Design

The ECD process affirms the idea that analysis of the KRs central to a 
given domain is integral to generating assessment tasks in that domain. 
Essential to this idea is the connection between a given domain KR itself, 
reasoning in the domain, and the way people use it in practice. This is 
critical because the knowledge needed to use a domain KR in a particular 
circumstance is often what we want to draw inferences about. Identifying 
and articulating the relationship between using specific KRs in particular 
situations and the type of knowledge elicited is an important link in the 
assessment design process. Identification of these relationships during the 
domain analysis process sets up the construction of arguments in domain 
modeling, which in turn sets up the creation of schemas for designing 
tasks. 

Butterfield et al.’s (1985) letter series tasks illustrate
the interplay between KRs and knowledge. In this example, the KR, a
pattern of letters, provides a way for both task designers and examinees to 
reason about the underlying pattern. In essence, this KR allows for assess- 
ment of the inductive reasoning ability of the test-taker; the KR structure 
itself becomes a tool for assessing this knowledge. 

Checklists and behavioral inventories are examples of KRs that have 
long been used to ground licensure and certification tests. As epistemic 
forms, they provide structure to the job analyst’s task of identifying the 
nature and frequency of tasks professionals carry out, from which assess- 
ment tasks will be devised. 

More recent work in cognitive task analysis addresses the nature, organization,
and use of knowledge that tasks employ (Schraagen, Chipman,
& Shalin, 2000). This allows for distinctions between different types of
knowledge and skills that one may want to evoke from an examinee, 
including declarative, procedural, or strategic knowledge, which may all 
be associated with one particular domain KR. The information is collected 
during the domain analysis phase of the assessment design. For example, 
Shute, Torreano, and Willis’s (2000) automated knowledge elicitation tool 
DNA (Decompose, Network, Assess) provides structured, user-friendly 
web forms to elicit domain experts’ input on declarative, procedural, and 
conceptual-knowledge requirements of common tasks in the domain. The 
DNA tool is an interactive design KR, capitalizing on technology and the 
wizard metaphor to elicit and structure domain information from subject- 
matter experts, and to store it in digital forms that can be transformed to 
support domain modeling, the next step in the design process. 

In addition to the argument schema shown in Figure 4,
another KR that has been developed for work in the domain modeling layer
is the design pattern (Mislevy et al., 2003). Design patterns encapsulate
knowledge about ways to address assessment challenges that recur across
domains or within particular domains, organized in categories that connect
to elements of an assessment argument on the one hand, and point
ahead toward the more technical elements of the CAF on the other. For example, Table
2 shows selected portions of a design pattern for problem-solving in finite 
systems, a valued skill in both everyday life (why won’t this door close?) and 
in technical domains such as aircraft repair, computer programming, and 
the troubleshooting of computer networks addressed by the CNS tasks. 
A design pattern for this particular skill can be utilized across domains 
because it capitalizes on similar patterns of problem-solving reasoning in 
each. Within any given domain, multiple design patterns can be used to 
target the knowledge, skills and abilities of interest — such as building a 
teamwork task around troubleshooting, working-in-groups, and self-mon- 
itoring design patterns. 

Table 2: Portions of a Design Pattern for Problem-Solving in Finite Systems 


Summary 

Students are presented a problem of determining the state of a system, and 
methods for gathering information about its state. No available diagnostic 
procedure is definitive; each rules in some possibilities and rules out others. 

Rationale 

Integrated knowledge structures, characteristic of effective problem 
solvers, are displayed in the ability to represent a problem, select and 
execute goal-directed strategies, monitor and adjust performance, and 
offer complete, coherent explanations. 

In particular, problem-solving to determine the state of a finite system with a 
set of tests requires an understanding of the procedures that can be applied 
to rule sets of states in or out, being able to interpret the results of the tests, 
synthesizing their information to determine what states are still possible 
after a series of tests, and being able to choose a next test that will effectively 
narrow the search space. 

Focal knowledge, 
skills, and abilities 

• Ability to apply knowledge of system and component functioning to 
solve a problem. 

• Ability to generate and elaborate explanations of task-relevant concepts. 

• Ability to build a mental model or representation of a problem to 
guide solution. 

• Ability to devise and manage problem-solving procedure. 

Additional 
knowledge, skills, 
and abilities 

• Domain knowledge. 

• Capability to carry out tests. 

• Ability to coordinate problem-solving with others (if required).

Characteristic task 
features 

• Statement of problem provides system, initial conditions, and set of 
test procedures. 

• System with imperfectly known state (e.g., fault, unknown components). 

• There is a finite (though possibly large) space of possibilities of the 
system state. 

• Each test procedure rules some aspects of system state in and others out. 


Variable task 
features 

• Level and nature of content knowledge required to solve problem. 

• Degree of domain familiarity required. 

• What is the fault(s)? 

• Fault simple, compound, intermittent? 

• Complexity of system to troubleshoot. 

• Degree of scaffolding or prompting. 

• Individual work, work with a partner, or as a member of a group? 

• Number of diagnostic procedures to choose from. 

• Redundant diagnostic procedures? 

• Overlapping diagnostic procedures? 

Potential observable 
variables 

• Correctness of solution. 

• Quality of evidence to support conclusions. 

• Quality of explanation of task-specific concepts. 

• Adequacy of problem representation or problem-solving plan. 

• Appropriateness of solution strategies. 

• Frequency and flexibility of self-monitoring. 

• Efficiency of solution. 

• Accuracy of deductions at each step. 

Potential work 
products 

• Written or verbal description/identification of where the problem is or what 
the solution is to the problem. 

• Illustration of problem solution and/or written justification for "Here is how
I know."

• Verbal or written description of anticipated problem-solving approach. 

• Verbal or written explanation of task-specific concepts. 

• Log or observation of student actions. 

• Observation data/log-file/think-aloud protocols during solution. 

• Indication of which possibilities are ruled in or out by a given test procedure.

• Indication of which possibilities are ruled in or out by all test procedures 
given thus far, at any point during the solution. 
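
The reasoning named in the Rationale and Characteristic task features of Table 2 (each test rules some states in and others out, and a good next test narrows the search space) can be illustrated with a small sketch. Everything specific here is an assumption made for the example: the four fault states, the two tests, and the worst-case selection criterion are hypothetical and are not taken from the design pattern or from the CNS tasks.

```python
def remaining_states(possible_states, test_results):
    """Intersect the sets of states that each test result leaves possible."""
    remaining = set(possible_states)
    for consistent_states in test_results:
        remaining &= set(consistent_states)
    return remaining


def best_next_test(remaining, tests):
    """Pick the test whose worst-case outcome leaves the fewest states.

    `tests` maps a test name to a function from state to outcome.
    `remaining` is assumed to be non-empty.
    """
    def worst_case(test):
        outcomes = {}
        for state in remaining:
            outcomes.setdefault(test(state), set()).add(state)
        return max(len(group) for group in outcomes.values())

    return min(tests, key=lambda name: worst_case(tests[name]))


# Hypothetical finite system: exactly one of four faults is present.
states = {"cable", "nic", "switch", "dns"}

# Hypothetical tests, each mapping a state to the outcome it would produce.
tests = {
    "ping_gateway": lambda s: s != "dns",              # True = ping fails
    "check_link_light": lambda s: s in {"switch", "dns"},  # True = light on
}

# Two results observed so far, each given as the set of states it allows.
print(remaining_states(states, [{"cable", "nic", "switch"}, {"switch", "dns"}]))
print(best_next_test(states, tests))
```

Observable variables such as "accuracy of deductions at each step" and "efficiency of solution" could be evaluated by comparing an examinee's logged actions against computations of this kind.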


The design pattern structure can be used to address the type of pro- 
ficiencies that people employ when using domain KRs. For example, in 
model-based reasoning an initial model, usually expressed in the form of 
a KR, is created and iteratively revised as it is tested in real-world situa- 
tions (Stewart & Hafner, 1994). The Architectural Registry Examination 
(ARE) (Bejar & Braun, 1999) utilizes this type of reasoning with a com- 
puter-aided design (CAD) system that has examinees produce a domain 
KR in the form of a site plan. At each step in this iterative process, exam- 
inees react to and modify their design based on their previous designs and 
remaining constraints for the design (Katz, 1994). The steps examinees 
take in this process (all, it may be noted, within the technology-based 
simulation environment that is itself an interactive KR) become a critical 
aspect of assessing their level of expertise in architectural design. 

Thus, the design pattern KR serves first as an epistemic form to syn- 
thesize experience and analysis of classes of valued work in ways that will 
support assessment design. It is then a source of information for the task
author creating specific tasks or task models for a particular context.
It provides grounding for the validity of tasks created in this manner by 
making explicit the link between the features, requirements, and evalua- 
tion procedures of a task and the knowledge and skills that are valued in 
the domain (Bennett & Bejar, 1998). 

While the sample design pattern illustrated in Table 2 is a static form, 
affordances provided by technology have been employed to facilitate their 
construction by geographically dispersed design teams and their interac- 
tive use by task authors. That is, the usefulness and efficiency of design 
patterns as a KR has been leveraged by embedding them in digital form, 
and taking advantage of technological affordances to help people build 
them and use them. The form in which design patterns are created is an 
object model that can be built by a dispersed team in real time over the 
Internet using a collaborative virtual work space (Hamel & Schank, 2005).
A “writer-friendly” online version of the design pattern structure presents 
item writers with a concise summary version of the pattern but allows 
them to follow links for additional discussion and examples of the various 
attribute entries, and to highlight entries from different attribute cate- 
gories that are related to one another with regard to task design choices 
(Mislevy & Liu, 2009). 

Knowledge Representations for Creating, Presenting, and Scoring Tasks 

After the evidentiary argument has been defined at the domain analysis 
and domain modeling layers, the next layers focus attention on structuring 
and generating actual tasks. These are the CAF layer, in which student, 
evidence, and task models are articulated, and the Implementation layer,
which includes task generation. This section notes roles that KRs play in
these processes. 

Task Creation 

In domain analysis, the designer identifies situations in which practi- 
tioners in a domain use the KSAs of interest, and on this basis in domain 
modeling the designer frames, in the KR of design patterns, paradigmatic 
situations to elicit those KSAs (recall Table 1). In the CAF, more detailed 
task models are created. A task model is a design KR that structures the 
authoring of the actual tasks that will be presented to the student. It 
describes the environment in which students will act to provide the data 
necessary to make inferences about KSAs, including the domain KRs that 
will be used to provide information to the examinees and to serve as work 
spaces and tools for them, and in which they will express the products and 
processes of their work. The values of the task model variables identified in 
a task model provide specifications such as the form of the work product, 
the materials necessary, and other features of the setting, all of which are 
grounded in the original assessment argument and play a variety of roles 
in task construction, presentation, scoring, and interpretation of results 
(Mislevy et al., 2002). 

Figure 7 shows a schematic diagram of the relationship
between the task model variables (on the right-hand side) and the assessment
implementation and delivery process. The task model variables, which
in this example include the language in which the task will be presented, 
inform the task design as well as the evidence portion of the process. As 
described in Mislevy et al. (2002), these attributes in the task model KR 
provide information for KRs used in task authoring, task selection, auto- 
mated scoring, psychometric modeling, and score reporting. 

A task model, then, is a design KR that includes details about how the 
information the tasks elicit is related to other components of the assess- 
ment. The task model also explicates what particular features are neces- 
sary to include and which are variable, or optional. This general idea has 
been embodied in a variety of particular forms. For illustration we use 
here the task template (Riconscente, Mislevy, & Hamel, 2005) developed in 
the Principled Assessment Design for Inquiry (PADI) project to describe 
task models more specifically. Task authors can use the template as a blue- 
print to create actual tasks that are grounded in the original assessment 
argument, without needing to reconstruct this reasoning. For instance,
Figure 8 shows a PADI task template for BioKIDS,
a project that helps students learn science inquiry (Gotwals & Songer,
2006). As can be seen in this example, the template lays out the student
and measurement models in conjunction with the task model. Further, 
the template articulates particular materials, activities, and tools associated
with the task template. In this way, the task template is connected
to the chain of reasoning that occurs at the domain analysis and domain 
modeling layers. 
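
The idea of a task model with fixed features and variable features can be sketched as a small data structure. This is a minimal illustration, not the PADI object model: the class names, field names, and the example variables and values are hypothetical, chosen only to show how a template can constrain the settings an author may choose.

```python
from dataclasses import dataclass, field


@dataclass
class TaskModelVariable:
    """A feature of the task that authors may vary within allowed values."""
    name: str
    allowed_values: tuple


@dataclass
class TaskTemplate:
    """Fixed features plus the variables an author must set for each task."""
    title: str
    work_product: str
    fixed_features: dict
    variables: list = field(default_factory=list)

    def instantiate(self, **settings):
        """Return a task specification after validating variable settings."""
        spec = dict(self.fixed_features)
        spec["work_product"] = self.work_product
        for var in self.variables:
            if var.name not in settings:
                raise ValueError(f"missing setting for {var.name}")
            value = settings[var.name]
            if value not in var.allowed_values:
                raise ValueError(f"{value!r} not allowed for {var.name}")
            spec[var.name] = value
        return spec


# Hypothetical template; names and values are illustrative only.
template = TaskTemplate(
    title="Interpreting data (illustrative)",
    work_product="open-ended written explanation",
    fixed_features={"tools": "paper and pencil"},
    variables=[
        TaskModelVariable("language", ("English", "Spanish")),
        TaskModelVariable("scaffolding", ("low", "high")),
    ],
)

print(template.instantiate(language="English", scaffolding="high"))
```

Because the same variable settings are available to later processes, a specification of this kind can inform task selection and scoring as well as authoring, which is the role the text assigns to task model variables.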

Figure 7: Schematic Showing the Roles of Task Model Variables

Figure 8: A BioKIDS Template in PADI Design System

Title: BioKIDS - multidimFive

Summary: This is a task specification for the entire BioKIDS test, assuming a
multidimensional student model with 2 SMVs.

Student Model Summary: Inquiry (Explorations, interpreting data, making
hypotheses/predictions) + Content (Biodiversity)

Student Models: [first entry partly illegible] Biodiversity; Hypothesis;
Building Explanation from Evidence; Re-expressing Data

Measurement Model Summary: 16 items have MMs which vary: some are dichotomous
multiple choice models, others are bundles with both MC and open-ended models.

Evaluation Procedures Summary: Multiple choice items are dichotomous
(0 = incorrect; 1 = correct). Open-ended items are scored on a partial credit
model (usually a 0-1-2 scale). Bundles are indicated where several student
work products are dependent on one another.

Work Product Summary: Some multiple choice (4-5 options). Some open-ended
construction of answers to given questions.

Template-level Task Model Variables (entries truncated in the original display):

• Amount of scaffolding. The task can guide students to think about certain
concepts or can help students structure their ans...

• Amount of Data. The number of data points presented to students in graphs,
tables and maps.

• Content area. Specific domain content under consideration.

• Content knowledge required (simple, mod, complex). This variable represents
the amount of content knowledge needed to bring to the task in order to sol...

• Data Representation Format. The format of data as it is presented to
students (bar graph, line graph, scatter plot, map, data ta...

Activities Summary: One activity per item because, for a bundled item, the
activity helps associate the MM with the proper Eval Procedure in a way that
the Gradebook can discern.

Tools for Examinee: Paper and pencil/pen. This test is entirely written.

Another advantage of task models as design KRs, beyond ensured
instantiation of the assessment argument at the task level, is their poten- 
tial for guiding the reusability and adaptability of tasks to different forms 
or assessments. Hively et al.’s “item forms” provide an early example of this 
type of design KR. Item forms and item models provide item-level tem- 
plates that can be adapted to a number of different assessments through 
changes in task features. Such templates allow the assessment designer 
the flexibility of adapting particular item types or tasks without losing the 
connection to the original assessment argument. This provides both effi- 
ciency and validity in task creation. Continuing with the example from 
the Butterfield et al. (1985) letter-series task, one can imagine using an 
“item form” approach whereby particular features of the letter-series task 
change (for example, letters, pattern) to create distinct items assessing the 
same reasoning. 
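
As a rough illustration of the item-form idea, the following Python sketch
generates letter-series items by varying two task features, the starting
letter and the step of the pattern. The function name, the feature names,
and the fixed-step pattern rule are assumptions made for this example; they
are not taken from Hively et al. or from Butterfield et al. (1985).

```python
import string

ALPHABET = string.ascii_uppercase

def make_letter_series_item(start, step, length=6):
    """Fill the slots of a hypothetical letter-series item form.

    start  -- first letter of the series (task feature: letters)
    step   -- distance between successive letters (task feature: pattern)
    length -- number of letters shown to the examinee
    """
    first = ALPHABET.index(start)
    # Wrap around the alphabet so every start/step combination is valid.
    series = [ALPHABET[(first + i * step) % 26] for i in range(length + 1)]
    return {
        "stem": " ".join(series[:length]) + " __",
        "key": series[length],
        "features": {"start": start, "step": step},
    }

# Two distinct items produced from the same form.
item_a = make_letter_series_item("A", 2)
item_b = make_letter_series_item("M", 3)
print(item_a["stem"], "->", item_a["key"])
print(item_b["stem"], "->", item_b["key"])
```

Because each item records the feature settings that produced it, the link
from an individual item back to the form, and hence to the assessment
argument, is retained.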

When a task model is in digital form and the slots are appropriately 
filled, the resulting form can serve as input to subsequent processes to 
create tasks in the forms in which they are needed in implementation and 
presentation (as in Bejar et al., 2003; Gierl et al., 2008; Hamel, Mislevy, &
Winters, 2008; and Hamel & Schank, 2006). Two examples of programs
that can facilitate task authoring using the idea of item templates are the
Mathematics Test Creation Assistant (TCA, Singley & Bennett, 2002) and
the Free-Response Authoring, Delivery, and Scoring System (FRADSS, 
Katz, 1995). Both of these tools allow for creation of multiple items from 
particular item models or item objects that are at a more general level of 
abstraction. Like PADI task templates, item forms and models support effi- 
ciency in their potential for reusability, as well as validity in their connec- 
tion to the assessment argument laid out in the domain analysis phase. 

KRs play an important role in the decisions that are made about the 
environment around the task. For example, choice of the format (for 
example, paper and pencil or computer-based; multiple-choice or diagram 
with essay) and the materials (for example, physical manipulatives) will 
all be shaped by the KRs that are critical to the domain, as identified in 
the domain analysis phase and carried through to the task template. This 
aspect of task authoring is discussed in further depth in the next section 
on task presentation. 

Task Presentation 

KRs are important for task presentation in several ways. First, the 
tasks themselves can be considered KRs. They are designed, based on the 
assessment argument, to be KRs that examinees must respond or react 
to in some manner, producing a work product that will be subsequently 
evaluated. Most often, a task employs important domain KRs to achieve 
this. Mathematics tasks use diagrams and mathematical notation, social 
studies tasks use maps and graphs, and music tests use musical notation.
The CNS utilizes symbols of network systems to assess examinees’ under- 
standing of network troubleshooting. Thus, the presentation of the tasks 
in this environment necessarily includes KR symbols, formats, and manip- 
ulations that the test-taker must be able to understand and use. 

An example of a task as the examinee experiences it (in contrast to the 
task object KR, in the IMS/QTI XML form that the presentation process
uses to render this view) is depicted in Figure 9. This screen shot is of a 
task from the Full Option Science System (FOSS) project, in which science 
phenomena are simulated in a computer environment. For this particular 
example, examinees are asked to interact with the symbols on the screen 
to simulate electrical circuits. A number of domain KRs are present in this 
example, such as the battery and switch. As a technology-based KR itself, 
the simulation environment affords interaction to the examinees so that 
the real-world implications of their actions with the simulated components 
can be visualized. In this way, the KRs have been tuned both to cognition 
in the domain and to the elements of an assessment argument. 

Figure 9: Prompt from FOSS/ASK Simulation

[Screen shot of an Inquiry Assessment prompt. Legible text: “Hello, You
did this activity already, but here it is if you want to do it again.
1) Explore the system. See how changing the variables changes the
outcome.”]

Decisions regarding what stimulus materials, resources, and levels
of scaffolding will be provided to examinees are all described in the task 
model. These decisions are often affected by the type of work product that 
is derived for a particular task. With the FOSS example, the work products 
produced for this item are similar in form to many others from the tasks 
created with the same template. 

Just as specifications for particular tasks are articulated in the task 
model, the presentation model provides specifications for rendering the 
task in a particular environment. For example, a presentation model for a 
computer-based assessment will be different from one for a paper-based 
test, even though the two might have identical task and evidence models. 
This flexibility is yet another example of the way in which the ECD approach 
enables adaptability and reusability of tasks. 
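
The separation between task model and presentation model can be sketched
in a few lines of Python. Everything in this example is hypothetical (the
Task class, the two rendering functions, and the sample prompt); it is meant
only to show one task object being rendered for paper and for a screen
without any change to the task itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    """Hypothetical task object: its content is fixed by the task model."""
    prompt: str
    options: tuple

def render_paper(task):
    # Paper presentation: lettered options printed under the prompt.
    lines = [task.prompt]
    for letter, option in zip("ABCD", task.options):
        lines.append(f"  ({letter}) {option}")
    return "\n".join(lines)

def render_screen(task):
    # Computer presentation: a structure a delivery client could draw as widgets.
    return {
        "prompt": task.prompt,
        "choices": [{"id": i, "label": o} for i, o in enumerate(task.options)],
    }

task = Task("Which component opens and closes a circuit?",
            ("battery", "switch", "bulb", "wire"))
paper_view = render_paper(task)
screen_view = render_screen(task)
print(paper_view)
```

Both views draw on the same task object, so the evidence model that
evaluates the response does not need to know which rendering was used.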

Finally, design KRs also play a role in facilitating presentation of tasks 
across the various aspects of assessment delivery. For example, the IMS 
Question and Test Interoperability (QTI) specification is an assessment 
KR that allows for interchange of information between authoring tools, 
item banks, test construction systems, and assessment delivery systems. 
In this way, the QTI aids in creating and presenting tasks more efficiently, 
by providing a shared language for KRs that are used and produced in com- 
puter-based assessment (Almond, Steinberg, & Mislevy, 2002). 

Task Scoring 

Articulating the student model requires specifying the student-model 
variables. Each student-model variable corresponds to some aspect of 
knowledge, skill, ability, or proficiency, presumed to drive probabili- 
ties of observable responses. They will be the variables in a latent vari- 
able model such as an IRT, latent class, cognitive diagnosis, or Bayes net 
model. Psychometric models such as these use probability-based methods 
to ground inferences about students. From the perspective of ECD, the 
student model and the measurement submodel of the evidence model are 
KRs that support probability-based reasoning about examinees based on 
evaluations of their performances. Structured around recurring eviden- 
tiary themes, measurement model fragments can be fit together flexibly 
for different problems and different kinds of data (Conati, Gertner, & 
VanLehn, 2002; Mislevy, 2006; Rupp, 2002). Being able to automatically 
assemble probability models in light of purposes and evolving conditions, 
as in simulation-based assessment, is an example of what engineers call 
“knowledge-based model construction” (Breese, Goldman, & Wellman, 
1994). Its implementation depends on developing KRs that encode key 
features of situations to guide the assembly of the measurement model 
and student model KRs.

The evaluative submodel of the evidence model involves identifying and
evaluating features of the examinee work product, in terms of values for 
the observable variables that are used by the measurement submodel to 
update the values of student model variables. We have discussed how what 
examinees say, do, or make to provide evidence in assessments is often 
expressed in terms of domain KRs, which examinees create, complete, 
transform, or interrelate — this leveraging of domain KRs being central to 
proficiency in the domain of interest. Students produce these response 
KRs in their interactions with the presentation process. They constitute 
the message passed to the evidence evaluation process. 
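
To make the phrase “update the values of student model variables” concrete,
the sketch below applies Bayes' rule to a single discrete student-model
variable after one observable variable has been evaluated. The two
proficiency states and all of the probabilities are invented for
illustration; an operational measurement model (IRT, latent class, or Bayes
net) would be considerably richer.

```python
def update_student_model(prior, likelihood, observation):
    """Apply Bayes' rule to a discrete student-model variable.

    prior       -- dict mapping each proficiency state to its probability
    likelihood  -- dict mapping state -> {observable value: probability}
    observation -- the observed value of the observable variable
    """
    unnormalized = {s: prior[s] * likelihood[s][observation] for s in prior}
    total = sum(unnormalized.values())
    return {s: p / total for s, p in unnormalized.items()}

# Hypothetical numbers: experts usually answer correctly, novices sometimes do.
prior = {"novice": 0.5, "expert": 0.5}
likelihood = {
    "novice": {"correct": 0.3, "incorrect": 0.7},
    "expert": {"correct": 0.9, "incorrect": 0.1},
}
posterior = update_student_model(prior, likelihood, "correct")
print(posterior)
```

With these assumed values, a correct response moves the probability of the
“expert” state from 0.5 to approximately 0.75.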

What is important here from the perspective of representation is that 
the form of the work product, as a KR, can be tuned to identifying and 
evaluating the features that convey evidence about the examinee’s profi- 
ciencies. The work product KR must capture traces of the cognitive pro- 
cesses that produced it, no matter whether the evaluation is carried out 
by humans or automatically (Messick, 1994). Taking advantage of devel- 
opments in technology to evaluate performances requires attention not 
just to the form of the work product KR and the procedures to be carried 
out, but also to virtually every link in the chain of reasoning that comprises
the assessment argument (Bennett & Bejar, 1998). To this end, the volume
Automated Scoring of Complex Tasks in Computer-Based Testing, edited by
Williamson et al. (2006), contains chapters that describe, from the
perspective of ECD, various methodologies for automated scoring of KRs
from performance assessments. In a later section, we discuss automated scoring
procedures used in CNS tasks, which adapt ideas from both the rule-based 
algorithms for scoring the log of patient management problems in the 
National Board of Medical Examiners’ Primum assessment (Margolis &
Clauser, 2006) and the natural language processing techniques used in 
automated scoring of essays (Deane, 2006). 

The KR of multiple-choice response format revolutionized testing 
first when it was introduced in the early decades of the twentieth century, 
because it virtually eliminated judgment in evaluation, and then again in 
the middle of the twentieth century when machine-based scoring of mul- 
tiple-choice items made standardized testing economical at vastly larger 
scales. Current work focuses on the use of more ecologically valid KRs as 
work products; that is, examinees’ performance in directly constructing, 
completing, or transforming domain KRs. To accomplish this objective 
economically requires KRs that in one view the examinee can interact with, 
but that in another view support both customizable automated evaluation 
procedures and flexible reuse across assessment domains and purposes. 
The key to successful automated scoring is the articulation of the cognitive 
psychology underlying the use of the domain KRs, which determines how 
assessment design and implementation KRs are structured and processed 
to provide the necessary evidence in the assessment argument.

"Generating Examples" Tasks

This section looks more closely at an innovative task family for use 
in large-scale testing, through the lens of KRs. Bennett, Steffen, Singley, 
Morley, and Jacquemin (1997) developed the mathematical expres- 
sion (ME) response type that allows presentation of any item for which 
the answer is a rational symbolic expression. It was created primarily to 
present mathematical modeling problems such as the following: 


A normal line to a curve at a point is a line perpendicular to the 
tangent line at the point. The equation of the normal line to the 
curve y = 2x^2 at the point (1,2) is given by ______.


Such questions typically describe a situation in one representational 
form (verbal), which the examinee must then translate to a symbolic form 
more suitable for mathematical procedures. Translating between alterna- 
tive representations is key to success in any technical field. In most applied 
fields — mathematics, engineering, architecture, and computer program- 
ming are good examples — a key activity is to translate the verbally stated 
requirements of a client to the representational forms of the field, because 
it is those representational forms that can be more effectively and effi- 
ciently operated on to satisfy client requirements (Larkin & Simon, 1987). 
This notion of translating verbal into more graphic or pictorial KRs is also 
consistent with research demonstrating the advantages of having students 
construct graphic organizers or concept maps from text (e.g., Lambiotte, 
Dansereau, Cross, & Reynolds, 1989; Robinson, 1998).

In addition to using the ability to translate between KRs as the object 
of measurement, how this response type uses KRs in scoring is of interest. 
One of the attractions of ME items is that they have no single correct 
answer. Rather, there can be many — perhaps an infinite number — of cor- 
rect answers because there are numerous ways to express the same math- 
ematical relationship. For ME, examinee responses always share the same 
basic KR, a mathematical expression. However, correct responses will 
almost certainly vary in their surface features. Thus, the scoring challenge 
is one of mathematical paraphrase. For example, in field trials, the following 
were among the correct responses examinees produced for the preceding 
problem:

-1/4x+9/4

(-1 *x+ 9)/4

-1/4 *x + (9/4)

1/4*(9-x)

-x/4 + 9/4

(-x+9)/4

-.25x+2.25

(9-x)/4

2 - 1/4 * (x-1)


To score answers automatically, each response is compared against a 
key expression, where that key expression can be any paraphrase of a correct 
answer. The comparison is done by substituting values in the examinee’s 
expression, evaluating it, substituting the same values in the key expres- 
sion, evaluating it, and subtracting one expression from the other. If the 
result is repeatedly zero (that is, across many different substitutions), the 
examinee response is considered to be correct. ME scoring works, then, 
by manipulating KRs. It does nothing more than compare the contents 
of the examinee’s KR to a representation expressed in the same symbol 
system, which might differ in its surface configuration but, if the response 
is right, not in semantics. Although examiners have evaluated answers for 
value rather than expression for centuries, the capability of manipulating 
algebraic expressions digitally enables designers to employ open-ended 
responses as work products in this representational form in large-scale 
tests. 
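
The substitute-and-subtract comparison described above can be sketched in
Python as follows. This is a minimal illustration under stated assumptions,
not the scoring engine of Bennett et al.: responses are assumed to be written
in Python syntax with explicit multiplication (so “-1/4x” would first have to
be normalized to “-1/4*x”), the sampling range and tolerance are arbitrary
choices, and the built-in eval() stands in for a proper expression parser.

```python
import random

def evaluate(expression, x):
    # eval() is used only to keep the sketch short; a real scoring engine
    # would parse the expression itself rather than trust examinee input.
    return eval(expression, {"__builtins__": {}}, {"x": x})

def matches_key(response, key, trials=20, tolerance=1e-9, seed=0):
    """Return True if response and key agree at every sampled value of x."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.uniform(-10, 10)
        try:
            difference = evaluate(response, x) - evaluate(key, x)
        except (SyntaxError, NameError, TypeError, ZeroDivisionError):
            return False  # malformed or unevaluable responses are not credited
        if abs(difference) > tolerance:
            return False
    return True

key = "(9 - x) / 4"  # any paraphrase of the correct answer can serve as key
responses = ["-x/4 + 9/4", "2 - 1/4 * (x - 1)", "-.25*x + 2.25", "-4*x + 6"]
results = {r: matches_key(r, key) for r in responses}
for response, credited in results.items():
    print(response, credited)
```

A small tolerance is used in place of an exact test for zero because the
substitutions are carried out in floating-point arithmetic. The last
response, the tangent-like line -4x + 6, is included as an example of an
answer that should not be credited.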

Bennett et al. (1999) also developed the “generating examples” (GE) 
response type in which problems present constraints but do not present 
enough information to determine the answer uniquely, and ask exam- 
inees to pose one or more instances that meet those constraints. GE ques- 
tions thus relax the problem structure, although unlike Simon’s (1978) 
“ill-structured” problems, GE items give enough information to determine 
whether a posed solution is a member of the universe of correct responses. 
And, unlike ME, this universe is not composed of only paraphrases but 
also includes quantitatively different responses. 

The following is a sample item: 


If n and m are positive integers and 11n - 7m = 1, what are two different
possible sets of values for n and m? 


The GE item class overlaps with the ME class. That is, we can pose GE 
items for which the work product is a constructed algebraic expression. 
That expression can take many quantitatively different forms and each of 
those forms can, in turn, have many paraphrases. Neither the paraphrases 
nor the quantitatively different forms may be completely specifiable in
advance. 

The GE response type can also accommodate other representational 
forms including numbers, letter patterns, graphs, or geometric figures (see 
Bennett, Morley, Quardt, & Rock, 2000). From the perspective of KRs, GE 
can be used to pose a problem in one representational form (for example, 
verbal) and collect a response in another (for example, symbolic, numeric, 
figural). But in contrast to ME, GE scores responses using a KR that differs 
from the examinee’s production. This KR is an executable key: computer
code that tests each examinee response against the constraints expressed 
in the item stem. Thus, the executable key is nothing more than an alter- 
native KR of the problem statement, optimized for use by a computer. 

For the sample item, the executable key would essentially check each 
response to see if it: 

• Contained two pairs of values, 

• Had a second pair different from the first, 

• Had each member of each pair be a positive integer, 

• Returned for the first pair a true result when its values are
substituted for n and m in the equation, 11n - 7m = 1, and

• Returned for the second pair a true result when its values are
substituted for n and m in the equation, 11n - 7m = 1.
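
The checks listed above translate almost directly into code. The following
Python function is one possible rendering of an executable key for the sample
item; the function name and the assumption that a response has already been
parsed into a list of (n, m) pairs are ours, not details of the operational
GE system.

```python
def executable_key(response):
    """Check a response to the sample GE item against its constraints.

    response -- a list of (n, m) pairs supplied by the examinee,
                assumed to have been parsed already
    """
    if len(response) != 2:
        return False                      # must contain two pairs of values
    first, second = response
    if tuple(first) == tuple(second):
        return False                      # second pair must differ from the first
    for n, m in (first, second):
        if not (isinstance(n, int) and isinstance(m, int) and n > 0 and m > 0):
            return False                  # each member must be a positive integer
        if 11 * n - 7 * m != 1:
            return False                  # each pair must satisfy 11n - 7m = 1
    return True

print(executable_key([(2, 3), (9, 14)]))   # two distinct solutions
print(executable_key([(2, 3), (2, 3)]))    # same pair given twice
```

For instance, (2, 3) and (9, 14) both satisfy the equation, since 22 - 21 = 1
and 99 - 98 = 1, so a response containing those two pairs meets every
constraint.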

For this question, then, multiple KRs are in play. The examinee works 
with verbal and symbolic representations in translating the problem, and 
then with symbolic and numerical ones in formulating a response. The 
scoring works with the numerical response and its own logical representation
to process that response.

Cisco Network Simulator (CNS) Performance Assessments

The Cisco Networking Academy Program (CNAP; http://cisco.netacad.net)
is a public-private partnership that teaches apprentice-level design,
installation, and troubleshooting of computer networks in more than
50,000 locations (“academies”) throughout the globe. Since its inception,
CNAP has employed hands-on, instructor-administered performance
(skills) examinations. When well administered, these exams constitute a
“gold standard” for assessing proficiency in the program. With more than
10,000 instructors and little local control, however, their reliability and
validity can vary substantially from one site to another. The web-based
CNS provides all academies with high-quality simulation-based per- 
formance assessment to complement local hands-on exams (Frezzo & 
Stanley, 2005). The CNS tasks discussed in the following sections grew out 
of research of the NetPass project (Behrens et al., 2004, Williamson et al., 
2004), which produced the initial versions of the presentation process and 
automated scoring procedures. This section considers the roles of KRs in 
the development and use of CNS tasks. The interpenetrating roles of tech- 
nology, cognition, and assessment design theory are clear throughout the 
discussion. 

The CNS Assessment as a Knowledge Representation 

The CNS assessment is itself a KR, which coordinates information 
about the curriculum and instruction that occurs in the Cisco Networking 
Academy Program, expert-novice studies on design and troubleshooting 
(Williamson et al., 2004), and research on assessment design in order to 
provide evidence about student proficiency at the end of the program. 
Figure 10 shows the web page that the students taking the CNS exam see 
as they work. This page contains a title, instructions for submitting the 
assessment, a timer, and tabs that link to key domain KRs that will be 
discussed further in the following section. The affordances that appear on 
the web page were designed to mirror other tools that students have used, 
including real networking devices. 

The “assessment as knowledge representation” of CNS is of paramount 
importance in CNAP. The widely varying quality of skills assessments 
across thousands of academies meant that instructional goals and perfor- 
mance expectations were not being clearly communicated to instructors 
and students. CNS was seen as a cost-effective way to use technology to 
provide this information widely, and to provide students with opportuni- 
ties to work through the cognitive aspects of design, configurations, and 
troubleshooting with CNS learning tasks as well as summative exams.

Figure 10: The CNS Examinee Interface



Domain Analysis of KRs in Networking 

Subject matter experts analyzed CNAP curriculum materials to survey 
the KRs used in instructional materials and in real-world problems at the 
targeted level of skills. They found usage of both general-purpose KRs, 
such as tables and graphs, populated with networking information and 
KRs that were particular to the domain. 

One example of a critical domain KR in the domain of computer net- 
working, and thus for CNS, is the logical topology representation. The 
logical topology is an abstracted map of the networking device nodes 
and the interconnections between those nodes (Frezzo & Stanley, 2005). 
Figure 11 shows an example of a logical topology, with icons representing 
PCs and icons representing routers. Two other domain KRs are shown at 
the bottom of this figure: the command-line interface (CLI), which allows 
students to interact with the virtual routers, and Cisco’s Internetwork 
Operating System (IOS), which is the control and programming language 
for networking the switches and routers in the logical topology KR. Both 
are inherently interactive and technology-based as people use them in 
actual practice, and as examinees need to use them in CNS tasks. As an 
aspect of knowledge about the domain, students are expected to be able 
to understand each type of KR and how the representations interact to
describe a given network — that is, what each representation tells one 
about the network and what it does not, where the representations share 
information in different forms and must therefore be consistent, and how 
each representation supports different aspects of reasoning about the net- 
work when troubleshooting. 

Figure 11: Two Key Domain KRs, the Logical Topology (top) and the Cisco IOS
Command-line Interface (CLI) (bottom)

Structuring Tasks around Domain KRs

KRs play a central role in assessment in determining the context in 
which students will provide evidence of their knowledge, skills, and abili- 
ties, which includes knowledge and proficiencies with domain KRs. CNS 
network configuration tasks illustrate the interactions between a student 
and the delivery system in the presentation, creation, and transformation 
of KRs. 

The initial presentation of the problem to the student takes the form 
of domain KRs: verbal descriptions using networking terminology
and concepts (Figure 10, previous page), a logical topology diagram
(upper window in Figure 11), and a CLI for configuring the devices
in the network (lower window in Figure 11). The student uses the CLI to 
configure the network devices by means of the Cisco IOS control language,
which is a symbol-system KR through which humans and network devices 
communicate with other devices. We note the fidelity of the CNS configu- 
ration tasks to real-world device configuration: The CNS environment uses 
the same Cisco IOS language and the same CLI interface as when config- 
uring real devices remotely from a terminal, and the simulator provides 
the same messages back as real devices would. This correspondence, made 
possible by the simulation environment, supports the construct-represen- 
tation line of argumentation for the validity of these tasks (Embretson, 
1983). 

As the student proceeds, two new KRs are created and others are 
transformed. The KRs that are transformed are the representations of the 
devices inside the simulator. These are symbol system KRs as well, rep- 
resenting the state of each hypothetical network device in a digital form 
that the simulation program can use to compute device responses to com- 
municate back to the student or to modify the behavior of other devices. 
These are not KRs of the learning domain, but rather of the simulation 
domain used in the presentation and evidence identification processes in 
the assessment delivery system. They are optimized to support the pro- 
cesses of the delivery system for presentation and scoring, and are not 
visible to the student. 

The KRs that are created are called the running configuration and the 
log file. The running configuration file for a router is the result of using the 
CLI to issue commands to change the active configuration of the router 
and its traffic control behavior. Figure 12 (next page) shows an example. 
Running configuration files are of great importance in the networking 
domain, and serve as the key work product in CNS configuration tasks. As 
a work product, a running configuration file indicates the final status of 
the network when a student completes the problem. The log file addition- 
ally captures all the commands that a student issues during the course of 
the work and the responses from the network. 

Running configuration files and log files are domain KRs, produced 
by examinees as they interact with a (simulated) network system using 
the Cisco IOS symbol-system that they are learning for just this purpose. 
As work products, they are assessment KRs that can be operated on by 
the evidence identification process of the CNS delivery system to identify 
and evaluate evidence about student proficiency. The interplay between 
humans — students and instructors — and the CNS system continues in 
the automated scoring and reporting processes discussed in a later section
of this paper.

Figure 12: Router Running Configuration File Serves as a Student Work Product



Using Design KRs to Support Task Creation 

Another way in which KRs played a crucial role in the development of 
the CNS is through the design KRs called design patterns, noted earlier. 
In the case of CNS, design patterns were used to create multiple forms 
to ensure exam security. Design patterns that are of interest to the CNS 
are those related to network design, implementation, and troubleshooting 
tasks (Wise, 2005). More-focused design patterns were developed from 
the Problem-Solving in a Finite System design pattern presented earlier, 
which incorporated the specialized domain knowledge and context of 
troubleshooting computer networks. 

Task shells are another KR used in CNS. CNS task shells are built around 
the specification of stimulus domain KRs, key aspects of their contents in 
terms of task model variables, and targeted KRs in terms of work products.
Figure 13 is an example of the part of a task shell that test developers
use to create instances from a family of simple network design tasks. 

Figure 13: Shell for CNS Design Task Problem Statement. Boldface Phrases are Variables

1. Setting sentence: A(n) setting is [create something that is a
typical activity for this setting].

2. Building size sentence: The setting is buildingLength long.

3. Network type sentence: The setting has been asked to install
a(n) EthernetStandard network for this [the typical activity
for this setting created above].

4. Subgroup 1 specification: The subgroup1 connections require a
bandwidth of bandwidthForASubgroup1.

5. Subgroup 2 specification: The subgroup2 connections require a
bandwidth of bandwidthForASubgroup2.

6. Subgroup 3 specification: The subgroup3 connections require a
bandwidth of bandwidthForASubgroup3.

7. “Force closets” sentence?: No networking equipment can be
stored in the Subgroups123 area.

8. Location of POP sentence: The link to the Internet is located
locationOfExternalConnection(POP).
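
A task shell of this kind behaves like a text template whose slots are
filled with values of task model variables. The Python sketch below
instantiates three simplified sentences of the shell using the standard
library's string.Template. The variable names follow Figure 13, but the
sentence wording is abridged and every value supplied is a hypothetical
choice made for this example.

```python
from string import Template

# Three abridged sentences of the shell; $-names are the shell's variables.
SHELL = [
    Template("The $setting is $buildingLength long."),
    Template("The $setting has been asked to install a(n) "
             "$EthernetStandard network."),
    Template("The $subgroup1 connections require a bandwidth of "
             "$bandwidthForASubgroup1."),
]

def instantiate(shell, values):
    # substitute() raises KeyError if a variable is left unfilled,
    # so an incomplete task specification cannot slip through.
    return " ".join(sentence.substitute(values) for sentence in shell)

values = {  # hypothetical settings chosen for this example
    "setting": "school",
    "buildingLength": "90 meters",
    "EthernetStandard": "100BASE-TX",
    "subgroup1": "classroom",
    "bandwidthForASubgroup1": "100 Mbps",
}
problem_statement = instantiate(SHELL, values)
print(problem_statement)
```

Changing the dictionary of values yields a new instance from the same
family of design tasks while leaving the structure of the problem statement
untouched.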


Using KRs to Create Tasks and Manage 
Assessment Systems 

CNS has revolutionized assessment in the Cisco Networking Academy 
Program, and in turn teaching and learning, by making high-fidelity 
simulations of the cognitive aspects of the domain available at low cost 
throughout the program over the Internet. Obviously, the KRs trans- 
mitted over the Internet to and from the examinee, in terms of stimulus 
conditions, interactions with the simulated network, and work products, 
must be represented in digital form, and transformations from one form 
to another are necessary to communicate between people and computer 
processes, and between one process and another. 

Many domain KRs and design KRs, some of which are mentioned in the 
previous sections, are used in the design, implementation, and delivery of 
CNS tasks. In this section we point to two particular ways that KRs are 
used in computer-supported task design and computer-based delivery — 
namely, task authoring and automated scoring. These leverage points
concern the way that assessment designs can use technology to more effi- 
ciently create the domain KRs examinees interact with, and capture and 
evaluate the KRs they produce. 

As noted earlier, task shells like those used in CNS are not a new idea. 
They are a KR that has been used for decades to synthesize knowledge in 
learning domains and knowledge about assessment, to improve efficiency 
and validity. What is new is the expression of task shells in computer-based 
forms that facilitate the work of test developers by allowing them to work 
with interfaces that create task specification KRs and automated or semi- 
automated procedures that operate on these forms to generate the KRs 
used in assessment delivery. Figure 14 (next page), for example, shows a 
screen from a CNS task authoring tool in which a test developer selects 
stimulus and work product KRs for troubleshooting tasks. Having speci- 
fied that a topology diagram will be present in a task, the test developer 
then specifies and configures a network that meets the conditions indi- 
cated in the task model variables, using an interface similar to the one that 
a student uses in a design task. The output of this interaction is another 
KR, an XML file whose format can be used by the presentation process to 
display the topology diagram, and by the simulator to create the network 
and govern its behavior. 

CNS uses automated scoring procedures in the evidence identification 
process. They consist of computer programs that scan for salient features 
of the KRs produced by students’ interactions with the presentation pro- 
cess, namely configuration files, log files, and network topology XML files. 
The scoring rules for the running configuration in configuration tasks, 
for example, produce values for graded response observable variables for 
accuracy of the routing protocol, whether access control lists (ACLs) are 
assigned to appropriate devices, and the correctness of the ACL rules. Log 
files contain more information — for example, about strategy use and effi- 
ciency — but these would present greater scoring challenges because they 
can vary considerably from one student to another. The NetPass proto- 
type used logical rules to identify the presence or absence of key features 
of the interaction, systematicity of steps, and number and seriousness of 
errors (Williamson et al., 2004). Clauser et al. (1997) describe this style of 
automated scoring for interactive problem solving in simulated patient- 
management problems at the National Board of Medical Examiners. 
Viewing the interaction between an engineer and a network as a con- 
versation carried out in the Cisco IOS language, DeMark and Behrens 
(2004) took a statistical language processing approach to analyzing the log 
files, with promising results in classifying learners along a novice-to-expert 
curriculum.

Figure 14: Screen from CNS Task Authoring Interface

[Screen shot of the authoring interface. Legible entries under
“Representations given to student” include a partial config file, show
command outputs, a network topology diagram, and block diagrams. Entries
under “Essential features” are hostname; no ip domain-lookup; interface up
and addressed correctly; passwords for console, enable, and vty; console
settings (default settings); and ping connectivity test.]
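
Rule-based scanning of a running configuration for salient features can be
sketched as follows. This is a simplified illustration rather than the CNS
or NetPass scoring code: the three rules, the names of the observable
variables, and the sample configuration are all assumptions made for the
example, and operational rules would yield graded rather than true/false
values.

```python
import re

def score_running_config(config_text, expected_hostname):
    """Return hypothetical observable variables for a running configuration."""
    lines = [line.strip() for line in config_text.splitlines()]
    hostnames = [line.split(None, 1)[1] for line in lines
                 if re.match(r"hostname\s+\S", line)]
    return {
        # Exactly one hostname command, naming the expected device.
        "hostname_correct": hostnames == [expected_hostname],
        # DNS lookup of mistyped commands has been switched off.
        "domain_lookup_disabled": "no ip domain-lookup" in lines,
        # Some form of enable password has been configured.
        "enable_password_set": any(
            re.match(r"enable (secret|password)\s+\S", line) for line in lines
        ),
    }

sample_config = """\
hostname Branch1
no ip domain-lookup
interface FastEthernet0/0
 ip address 192.168.1.1 255.255.255.0
 no shutdown
"""
observables = score_running_config(sample_config, "Branch1")
print(observables)
```

In this invented work product the hostname and domain-lookup features are
present but no enable password has been set, so the third observable is
false. Log files, which vary far more across students, would need rules
about sequences of commands rather than the presence of single lines.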


The resulting observable variables are a KR that is sent to the reporting
process to produce the student score report. A computer program thus 
transforms information in the form of machine-readable KRs containing 
values of observables into a KR that summarizes results on this task for 
human students and instructors. The reporting process creates an accom- 
panying KR called an item-information page (Figure 15, next page), which 
details by item how the student responded and the scoring rubric that was 
applied.

Figure 15: Item Information Page Including Student Model Variables,
Feedback, and Work Product

KRs play roles in managing and coordinating the various aspects of
building an assessment. For CNS, aspects of the curriculum, instruction, 
and assessment are intertwined around the domain and design KRs. Many 
actors, including learners, instructors, subject-matter experts, program- 
mers, psychometricians, and automated delivery processes use the KRs 
that are embodied in the assessment to interact and communicate with 
one another. Several benefits have accrued from explicating and exploiting 
the roles of KRs in assessment design (DeMark, West, & Behrens, 2005). 
These include improving alignment among curriculum, assessment, and 
instruction; providing efficiency and scalability in task and test construc- 
tion; and grounding the defensibility of tasks in high-stakes tests. 

In more recent work, assessment designers have extended these ideas 
to more local customization for instructors for learning exercises and for- 
mative assessment. A dynamic software environment called Packet Tracer 
allows instructors to create tasks and students to use and manipulate 
the multiple KRs it contains (Frezzo, 2009; Frezzo, Behrens, & Mislevy, 
2009). Figure 16 shows an example with multiple interactive 
KRs, including the logical topology and command-line interface. The 
central development team used design patterns for network design, 
configuration, and troubleshooting to create sample tasks and a help 
system to assist instructors in using Packet Tracer effectively. 

Figure 16: Packet Tracer's Multiple Interactive KRs, Including Logical Topology, Cisco IOS CLI, OSI Model View, Router State Table, and Animated "Packet Movie" Mode

Conclusion 

These are exciting times in assessment, with rapid developments in 
fields that are fundamental to the conception, design, and use of educa- 
tional tests. These include statistics, measurement models, technology, 
cognitive psychology, and learning domains. The challenge is how to put 
new insights to work to improve assessment. Knowledge representation 
plays a central role in this endeavor. Two primary ways in which external 
KRs play a role in assessment can be described as domain KRs and design 
KRs. 

Domain KRs are representations that are used to express ideas and to 
carry out work in domains. They concern the what of assessment. Insights 
from the cognitive, situative, and sociocultural perspectives in psychology 
help us to understand the roles of KRs in the development of competence 
and of expertise. Domain KRs are critical for understanding the domain; 
hence they are pivotal points in learning and in assessment. Learning to 
think in their terms is a target of learning. In assessment, they help 
define the environments that students work in, serve as vehicles for 
carrying out the work, and, as students produce them, constitute work 
products for evaluation. Continual advances in technology mean that KRs 
are increasingly interactive and amenable to digital representations. It is 
through the psychology of using KRs and the theory of assessment design 
that we will understand how to present information, afford interaction, 
and capture work products in these forms. 

Making assessment design more efficient requires greater under- 
standing of the assessment enterprise. Recent work on “assessment engi- 
neering” (e.g., Luecht, 2002; Mislevy et al., 2003) aims not only to make 
the underlying principles explicit, but also to embed the underlying prin- 
ciples in design KRs that help assessment professionals structure, and at 
times automate, their work (Mislevy & Haertel, 2006). Assessment design 
KRs thus concern the how of assessment. They facilitate communication 
between different levels of the assessment design and provide capacity for 
reusing assessment ideas and task components. Advances in technology 
equally provide opportunities to design and deliver assessments more 
effectively. It is through improved frameworks of assessment design that 
we will understand how to create design KRs to capitalize on these oppor- 
tunities. 